Adaptive Information Filtering
Abstract 

Yi  Zhang,  University  of  California  Santa  Cruz 


This project studies personalized, proactive information filtering agents that push relevant
information to the user without requiring an explicit user query. To do this, the agent adaptively
learns a detailed user model while observing and interacting with the user. We use Bayesian
statistical theory and machine learning techniques to tackle two major challenges:
how to build an initial user profile with minimal user effort,
and how to improve personalized recommendation based on multiple sources of evidence, such as social
networks, implicit user feedback, explicit user feedback, and context information. The research
has been evaluated on TREC data, data the PI collected through a user study, and data collected
from digg.com, Citeseer, and del.icio.us. This project led to 1 book chapter, 2 journal papers, 4 refereed
conference publications, 1 unrefereed publication, and 1 demo system.

Figure 1: Architecture of an Adaptive Filtering System used by an information agent for tracking
information related to potential terrorism.

PROJECT  REPORT 

Problem 1: To help prevent unanticipated security events such as 9/11, agents
working on homeland security need to read a large amount of information from heterogeneous
information resources (reports from other agents, email, news pages, blog space, Internet
discussion groups, etc.). Unfortunately, it is impossible to read all documents from massive incoming
data streams. The nation is at great risk because of the biases of individual agents and
missed information related to potential terrorist attacks.

Problem 2: Since 2005, more than 54 million Americans have been affected by reported
information leaks (Labs, 2006). This number includes only publicly reported cases; possibly
worse are cases in which an organization such as the FBI does not even know whether, or what,
sensitive information was leaked (New_York_Times, 2007) (Washington_Post, 2007) (USA_Today,
2007) (Associated_Press, 2007).

Solution: Adaptive information filtering addresses the above problems. An
information filtering system monitors document streams to find the documents that match a
user profile. As it runs, an adaptive filtering system frequently updates the user profile
based on observations of the document streams and on explicit and implicit user feedback.

Figure 1 shows the rough architecture of the filtering system. When it is first launched for a new
user, the user provides relevance feedback through the interactive feedback interface by identifying
examples of relevant and non-relevant information. From this training data, the system learns
a profile. The profile is used to monitor incoming document streams and identify new relevant
information, which is put into a Relevant Information queue for the user to read.
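
The loop described above can be sketched in a few lines of Python. This is a minimal
illustration, not the project's system: it assumes a Rocchio-style term-weight profile and a
fixed delivery threshold in place of the Bayesian models studied in this project, and the names
AdaptiveFilter, update, process, threshold, beta, and gamma are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a unit-length term-frequency vector (a dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


class AdaptiveFilter:
    """Toy adaptive filter: a term-weight profile updated from relevance feedback."""

    def __init__(self, threshold=0.2, beta=1.0, gamma=0.5):
        self.profile = {}           # term -> weight, learned from feedback
        self.threshold = threshold  # assumed fixed delivery threshold
        self.beta = beta            # weight given to relevant examples
        self.gamma = gamma          # weight given to non-relevant examples
        self.queue = []             # the "Relevant Information" queue

    def update(self, text, relevant):
        """Rocchio-style update: move toward relevant text, away from non-relevant text."""
        step = self.beta if relevant else -self.gamma
        for term, weight in vectorize(text).items():
            self.profile[term] = self.profile.get(term, 0.0) + step * weight

    def score(self, text):
        """Dot product between the profile and the document vector."""
        return sum(self.profile.get(t, 0.0) * w for t, w in vectorize(text).items())

    def process(self, text):
        """Deliver the document to the queue if it scores at or above the threshold."""
        if self.score(text) >= self.threshold:
            self.queue.append(text)
            return True
        return False


# Initial relevance feedback from the user, then monitoring of an incoming stream.
agent = AdaptiveFilter()
agent.update("suspicious chemical shipment reported at port", relevant=True)
agent.update("local team wins football match", relevant=False)
for doc in ["chemical shipment delayed at northern port", "football match tonight"]:
    agent.process(doc)
print(agent.queue)
```

Feedback on delivered documents can be passed back through update, so the profile keeps
adapting while the system runs.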

Research Contribution: This project addressed two major challenges in an adaptive filtering
system: first, how to build an initial user profile with minimal user effort; second, how to improve
personalized recommendation based on multiple sources of evidence, such as social networks, implicit user
feedback, explicit user feedback, and context information. More specifically, the contributions
are:

1. We studied how to develop complex data-driven user models that go beyond the bag-of-words
model using the graphical modeling approach (Zhang, 2008). Experiments on data
collected from a news filtering user study, as well as another data set, demonstrated that
the graphical modeling approach helps us better understand the complex domain and
improves adaptive information filtering performance.

2. We developed a Bayesian hierarchical modeling approach, which we call Discriminative
Factored Prior Models (DFPM), for information filtering (Zhang & Zhang, 2010a). Compared
with existing approaches, this approach can 1) borrow discriminative criteria of other users
while learning a particular user profile through a factored prior; 2) trade off well between
diversity and commonality among users; and 3) handle the challenging classification
situation where each class contains multiple concepts. Experimental results on digg.com users
show that our models significantly outperform the baseline models of L2-regularized logistic
regression and the standard Bayesian hierarchical logistic regression models.

3. We also studied how to make recommendations with networked data from the Internet, social
networks, scientific paper citations, etc. (Chi et al., 2009). We first identified four main data
dimensions that are common in most networked data, namely people, relation, content,
and time. We proposed a polyadic factorization approach that directly models all the dimensions
simultaneously, together with an efficient implementation of the algorithm that takes advantage of the
sparseness of the data and has time complexity linear in the number of data records in a dataset.
Applying the technique to the blogosphere and to personalized recommendation in paper citations,
we found that the framework is able to provide deep insights jointly obtained from the various
dimensions of networked data.

4. We participated in the TREC 2009 relevance feedback track (Zhang et al., 2009). We tried
clustering and Transductive Experimental Design (TED) methods to automatically find good
documents on which the user provides relevance feedback. We performed query expansion based on a
relevance language model learned from the labeled relevant documents. Our retrieval results after
relevance feedback were ranked 2nd among all participants.

5. Standard information retrieval models usually focus on relevance, without considering other
criteria (cost, readability, novelty, recency, etc.). We researched multi-criteria decision
analysis for information retrieval. We found that using multiple user-centric criteria always
produced better results than using a single criterion, and we also found non-linear interactions
among criteria (Wolfe & Zhang, 2009) (Wolfe & Zhang, 2010).

6. Motivated by the commonly used faceted search interface in e-commerce, we investigated
interactive user profile learning mechanisms for personalized filtering and personalized
retrieval based on faceted document metadata (Zhang & Zhang, 2010b). Experiments on user
feedback collected through Amazon Mechanical Turk show that the widely used Boolean
filtering approach does not work well for text document retrieval, because metadata
assignment in semi-structured text documents is incomplete. Instead, a soft model we proposed
performs more effectively for personalized retrieval.

7.  Since  there  is  no  good  teaching/tutorial  material  on  adaptive  information  filtering,  we  wrote 
a  book  chapter  on  it  (Zhang,  n.d.). 
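
The contrast in contribution 6 between Boolean filtering and a soft model can be illustrated
with a small sketch. The scoring rule below is an assumption made for illustration, not the
model of Zhang & Zhang (2010b): a facet that is missing from a document's metadata earns
partial credit (the hypothetical missing_credit parameter) instead of causing rejection. The
function names and example facets are likewise hypothetical.

```python
def boolean_match(doc_facets, wanted):
    """Strict filter: every requested facet value must be present in the metadata."""
    return all(doc_facets.get(facet) == value for facet, value in wanted.items())


def soft_score(doc_facets, wanted, missing_credit=0.5):
    """Graded match in [0, 1].

    A matching facet scores 1, a contradicting facet scores 0, and a facet
    absent from the metadata scores missing_credit (an assumed constant).
    """
    if not wanted:
        return 1.0
    total = 0.0
    for facet, value in wanted.items():
        if facet not in doc_facets:
            total += missing_credit   # unknown, so neither accept nor reject outright
        elif doc_facets[facet] == value:
            total += 1.0              # metadata agrees with the user's choice
    return total / len(wanted)


wanted = {"topic": "security", "region": "europe"}
docs = {
    "complete": {"topic": "security", "region": "europe"},
    "region_missing": {"topic": "security"},            # incomplete metadata
    "wrong_topic": {"topic": "sports", "region": "europe"},
}
for name, facets in docs.items():
    print(name, boolean_match(facets, wanted), soft_score(facets, wanted))
```

Under the Boolean filter, the document with incomplete metadata is rejected exactly like the
off-topic one; the graded score keeps it ranked above the off-topic document.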

This project resulted in a demo filtering system that can filter many RSS feeds (including Twitter
feeds) to prevent critical information from being ignored, which is a serious problem for many
companies, government agencies, and individuals. The research results have been disseminated widely
through high-quality academic journals (e.g., Journal of Information Processing and Management,
IEEE Transactions on Multimedia) and international conferences (e.g., SIGIR, CIKM, and TREC).